In this work we explain how to properly use mean-field methods to solve the inverse Ising problem when the phase space is clustered, that is, when many states are present. The clustering of the phase space can occur for many reasons, e.g. when a system undergoes a phase transition, but also when data are collected in different regimes (e.g. quiescent and spiking regimes in neural networks). Mean-field methods for the inverse Ising problem are typically used without taking into account the possible clustered structure of the input configurations and may lead to very poor inference (e.g. in the low temperature phase of the Curie-Weiss model). In this work we explain how to modify mean-field approaches when the phase space is clustered and we illustrate the effectiveness of the new method on different clustered structures (low temperature phases of Curie-Weiss and Hopfield models).

I. INTRODUCTION

The inverse Ising problem has been the subject of a large number of recent works. Although this problem has been known for many decades under the name of Boltzmann machine learning (BML), many recent applications and developments in different fields (e.g. biology, computer science [5] and physics) have renewed the interest in studying such problems. BML can be investigated under two very different approaches.

In the first one, which concerns this work, a set of data is generated according to the Gibbs-Boltzmann measure of a generic Ising model. The input data for the inverse problem are therefore independent and distributed according to the Boltzmann distribution of the system. In the second case, the data are generated according to a stochastic dynamical process which correlates configurations close in time, and these correlations in the input data are exploited in solving the inverse problem [14].

In both cases, the traditional Bayesian approach consists in maximizing the likelihood function of the data. In this work, we focus on the first case, which is commonly named the “static inverse Ising problem” and is harder than the second case.

In the static case, maximizing the likelihood is a complicated task, because it directly depends on the partition function, which is impossible to compute efficiently (in the general case, its complexity grows exponentially with the system size). However, it is still possible to maximize the likelihood by the expectation-maximization method using a Monte Carlo (MC) algorithm and doing a Boltzmann learning procedure [13]. The MC algorithm is used to evaluate the average value of the observables of the system (here the magnetizations and the correlations) and to update the value of the magnetic fields and the couplings by doing a gradient ascent. Yet, it is known that MC estimates do not converge quickly in many cases and may require many steps to obtain accurate mean values. It means that the MC algorithm should be run for a long time at each step of the BML procedure, making the method quite slow. For this reason, faster methods based on mean-field approximations are commonly used in practical applications.

In a recent work, Nguyen and Berg have revisited the problem of finding a good mean-field (MF) approximation for the inverse Ising problem. It was already known that MF methods fail to provide a good reconstruction of the couplings at low temperatures even for ferromagnetic systems (see Fig. 1 for an example on a ferromagnet and [15] for an example on a MF spin glass). Worse than that, this problem in coupling reconstruction occurs also in cases where the MF approximation is exact in the thermodynamic limit (e.g. the Curie-Weiss model). This failure in reconstructing couplings in ferromagnetic systems can be understood by looking at the input configurations at low temperatures: below the ferromagnetic transition, indeed, configurations are clustered in two groups of respectively positive and negative magnetization. The naive MF (nMF) approximation is based on the self-consistency equations for the magnetizations, $m_i = \tanh(\beta \sum_j J_{ij} m_j)$, with $\beta$ being the inverse temperature, which have 3 solutions for $\beta > \beta_c$: it is well known that the $m_i = 0$ solution is unphysical, while the two solutions with $m_i \neq 0$ are thermodynamically stable. However, considering all the input configurations together, the average magnetizations are zero by symmetry. Therefore, a naive use of the MF equations infers the couplings using the unphysical $m_i = 0$ fixed point, and leads to a very poor result. Notice that the same problem arises if one computes correlations in a naive way: using all the input data, connected correlations would not decay at long distance. Therefore, in order to use the nMF equations properly, it is mandatory to look at the two other solutions, characterized by non-zero magnetization. These solutions arise naturally when considering the decomposition of the Gibbs-Boltzmann measure in the configuration space.
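
The coexistence of fixed points can be checked numerically. The following minimal sketch assumes the Curie-Weiss case with zero field, where the self-consistency equation reduces to $m = \tanh(\beta m)$; the function name `cw_magnetization` and the iteration parameters are illustrative choices, not part of the original work.

```python
import math

def cw_magnetization(beta, m0=0.5, tol=1e-12, max_iter=100_000):
    """Iterate m <- tanh(beta * m) from the starting point m0 until it stops changing."""
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Above the transition (beta > 1) the iteration ends on one of the two
# magnetized solutions, selected by the sign of the starting point;
# below the transition it decays to the paramagnetic solution m = 0.
for beta in (0.5, 1.5, 2.0):
    print(beta, cw_magnetization(beta, m0=0.5), cw_magnetization(beta, m0=-0.5))
```

Averaging the two magnetized solutions gives zero, which is exactly the value obtained from unclustered data and the reason why the naive inference ends up on the wrong fixed point.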

Nguyen and Berg consider the nMF equations for both states (of positive and negative magnetization) at the same time. In this way they obtain an over-constrained system of linear equations to be solved. They manage to find a solution by using the pseudo-inverse of a matrix (see their work for further details). We will see that this approach can be considerably simplified in the case of the Curie-Weiss (CW) model, and then generalized to models with many free-energy minima. They also consider the Sherrington-Kirkpatrick (SK) model as a case study with a clustered phase space at low temperatures. We would like to emphasize, however, that the division into metastable states of the SK model is somewhat problematic for this approach. The metastable states of the SK model in the glassy phase are highly non-trivial and therefore it is very difficult even to define them properly in a system of limited size. Therefore we claim that the inference algorithm of Nguyen and Berg, as well as the one presented in the present work, is not suitable for this kind of model, for which more elaborate techniques (such as the pseudo-likelihood method [10, 11]) are required.

In the present work we show that couplings can be well inferred using nMF equations also in the low temperature phase if the input configurations are first clustered and the nMF inference algorithm is applied separately to the data in each cluster. We show that our inference procedure, based on solving the nMF or TAP equations inside each cluster separately, is much simpler than the method proposed by Nguyen and Berg, where the self-consistency equations for each cluster need to be solved simultaneously. Therefore the use of complicated numerical algorithms such as the pseudo-inverse is not necessary. In addition, we show that, at variance with what is claimed by Nguyen and Berg, using the present inference procedure one does not estimate the magnetic fields wrongly. It is worth mentioning that, when using one of the MF fixed points with $m_i \neq 0$, a spurious magnetic field unavoidably appears due to errors in the inferred couplings. However, this magnetic field is very small and decreases when increasing the number of input data.

In order to show that our method is very efficient we apply it to different kinds of models. First we show that in the CW model the results are as good as those from more elaborate methods, like the pseudo-likelihood method. Then we focus on the Hopfield model, where the number of different free-energy minima can be controlled and made larger. We show that it is possible to improve the results of the inference process by clustering the set of input configurations and to infer the right number of clusters to be used. We should mention that a previous attempt to infer the couplings in the (sparse) Hopfield model from data collected in a single state has been reported in the literature. However, in that work, the interaction network was assumed to be known and only the coupling intensities were inferred, so a direct comparison with our results is not possible.


II. PROBLEM DEFINITION AND INFERENCE 
ALGORITHMS 

In the static inverse Ising problem one aims at inferring the value of the couplings between the variables and the possible magnetic fields, given a set of $M$ equilibrium configurations. More precisely we consider an Ising model with $N$ spins defined by the Hamiltonian

$$H(s) = -\sum_{i<j} J_{ij} s_i s_j - \sum_i h_i s_i , \tag{1}$$

where $i, j = 1, \dots, N$. In the static case, the inference process is done by using input data distributed according to the Gibbs-Boltzmann measure

$$P_{GB}(s) = \frac{e^{-\beta H(s)}}{Z} , \quad \text{where} \quad Z = \sum_s e^{-\beta H(s)} . \tag{2}$$

We recall that the $M$ sampled configurations are assumed to be independent.
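
To make the cost of the partition function concrete, the sketch below evaluates the Gibbs-Boltzmann measure by brute-force enumeration of all $2^N$ configurations, which is feasible only for very small $N$. The function names `energy` and `gibbs_boltzmann` and the list-of-lists representation of the couplings are assumptions made for illustration.

```python
import itertools
import math

def energy(s, J, h):
    """H(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i for a tuple s of +/-1 spins."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    field = sum(h[i] * s[i] for i in range(n))
    return -pair - field

def gibbs_boltzmann(J, h, beta):
    """Return (Z, probs), where probs maps every configuration to its probability."""
    n = len(h)
    # The loop visits 2**n configurations: this is the exponential cost
    # that makes exact likelihood maximization impractical.
    weights = {s: math.exp(-beta * energy(s, J, h))
               for s in itertools.product((-1, 1), repeat=n)}
    Z = sum(weights.values())
    return Z, {s: w / Z for s, w in weights.items()}
```

For $N = 100$ spins the same loop would visit about $10^{30}$ configurations, which is why approximate methods are needed.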

In the following we will consider two different families of inference methods. For mean-field methods, we shall consider the average magnetizations and correlations of the data

$$m_i = \frac{1}{M} \sum_{a=1}^{M} s_i^a , \tag{3}$$

$$c_{ij} = \frac{1}{M} \sum_{a=1}^{M} s_i^a s_j^a - m_i m_j . \tag{4}$$

These observables are the only information needed to infer the parameters of the model when using mean-field methods. We will also consider the pseudo-likelihood method, for which the entire set of sampled configurations $\{s^a\}$ is needed. Let us now describe how these methods work and how we will use them in the context of a clustered phase space.

We first consider the naive mean-field approach, where the equations can be simply derived by considering the solution of the Curie-Weiss model (where $J_{ij} = 1/N$). For this model, the magnetizations and the correlations are given by

$$m_i = \tanh\Big(\beta \sum_{j \neq i} J_{ij} m_j + \beta h_i\Big) , \tag{5}$$

$$(c^{-1})_{ij} = \frac{\delta_{ij}}{1 - m_i^2} - \beta J_{ij} . \tag{6}$$

By inverting eq. (6) we can directly reconstruct the couplings $J^*_{ij}$. Then, by using the $J^*_{ij}$ and eq. (5) we can infer the magnetic fields $h^*_i$:

$$J^*_{ij} = -\frac{1}{\beta} (c^{-1})_{ij} + \frac{\delta_{ij}}{\beta (1 - m_i^2)} , \tag{7}$$

$$h^*_i = \beta^{-1} \, \mathrm{atanh}(m_i) - \sum_{j \neq i} J^*_{ij} m_j . \tag{8}$$

We refer to this method as nMF in the rest of the article.
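
The nMF inversion can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `nmf_inference` is hypothetical, the samples are assumed to be an $M \times N$ array of $\pm 1$ values, and the diagonal of the inferred coupling matrix is simply set to zero since self-couplings are not part of the model.

```python
import numpy as np

def nmf_inference(samples, beta=1.0):
    """Naive mean-field estimates (J, h) from an (M, N) array of +/-1 spins."""
    s = np.asarray(samples, dtype=float)
    m = s.mean(axis=0)                             # average magnetizations
    c = (s.T @ s) / s.shape[0] - np.outer(m, m)    # connected correlations
    J = -np.linalg.inv(c) / beta                   # off-diagonal part of eq. (7)
    np.fill_diagonal(J, 0.0)                       # assumption: discard self-couplings
    h = np.arctanh(m) / beta - J @ m               # eq. (8)
    return J, h
```

The matrix inversion requires the correlation matrix to be non-singular, so the number of samples should be comfortably larger than $N$ and no spin should be frozen in the data.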

A second approach commonly used is the pseudo-likelihood method (PLM). PLM is based on maximizing the conditional probability of one spin $s_i$ given that the rest of the spins are fixed: $p(s_i | s_{\setminus i})$. The PLM consists in maximizing the sum of all the log-pseudo-likelihoods

$$\mathcal{PL} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{a=1}^{M} \log p(s_i^a | s_{\setminus i}^a) . \tag{9}$$

In this method, we need to have access to all the configurations $\{s^a\}$. The advantage of this method is that it also deals with high-order correlations and thus provides much better performance on finite-dimensional systems, and it can also handle a clustered phase space directly. Moreover, it has a polynomial complexity, at variance with using the true likelihood of the data.
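
As an illustration of why the pseudo-likelihood is cheap to evaluate, the sketch below computes it for given parameters. It assumes the Ising conditional probability $p(s_i | s_{\setminus i}) = e^{\beta s_i H_i} / (2 \cosh \beta H_i)$ with local field $H_i = h_i + \sum_{j \neq i} J_{ij} s_j$, and a normalization by $NM$; the function name `log_pseudo_likelihood` is hypothetical, and the maximization over $J$ and $h$ is not shown.

```python
import math

def log_pseudo_likelihood(samples, J, h, beta=1.0):
    """Average of log p(s_i | other spins) over all spins and all samples."""
    n = len(h)
    total = 0.0
    for s in samples:
        for i in range(n):
            # Local field acting on spin i, produced by h_i and all the other spins.
            local = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
            total += beta * s[i] * local - math.log(2.0 * math.cosh(beta * local))
    return total / (n * len(samples))
```

Each evaluation costs $O(M N^2)$ operations and never requires the partition function, in contrast with the true likelihood.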


A. Clustering methods and inference with clustered phase space

Here we describe the clustering algorithms that we use to divide the configurations into clusters before applying the nMF method. These clustering algorithms group configurations together based on their distances: configurations are put in the same group if they are “close” enough and “far” from the other clusters, where the concepts of “close” and “far” usually need to be determined in a self-consistent way. We use the Hamming distance defined by $d_{ab} = \frac{1}{4N} \sum_i (s_i^a - s_i^b)^2$. In the present work we use two different clustering methods. First we consider the soft K-means clustering [20]. This method partitions the space of configurations by assigning each configuration to the closest of the $K$ centers “softly” (a configuration is assigned to a center with a given probability). Then the position of the $K$ centers is updated according to the position of the configurations inside each cluster. The procedure is repeated until convergence. This method is very fast, with a complexity scaling as $O(M)$, but the results can depend strongly on the initial conditions (i.e. on how the $K$ centers are chosen at the beginning).

A second method is based on density clustering. The density clustering algorithm we consider [21] first defines the density around each point. In our case the density is the number of configurations within a given range. Then, each data point is associated with its closest neighbor of higher density. This process naturally separates the phase space into a number of clusters which depends on the range used for defining the neighborhoods. Therefore, by using this algorithm we do not need to specify the number of clusters. Thus this second clustering algorithm has the advantage of finding the number of clusters by itself. It suffers, however, from a larger complexity, scaling as $O(M^2)$.

After clustering the configurations we have to use them properly to infer the parameters of the model. We define the observables of the $k$-th cluster by

$$m_i^{(k)} = \frac{1}{M_k} \sum_{a \in C_k} s_i^a , \tag{10}$$

$$c_{ij}^{(k)} = \frac{1}{M_k} \sum_{a \in C_k} s_i^a s_j^a - m_i^{(k)} m_j^{(k)} , \tag{11}$$

where $C_k$ is the set of indices of configurations belonging to the $k$-th cluster and $M_k = |C_k|$. We now apply the nMF equations separately for each cluster and obtain a different estimate of the parameters for each cluster, $J_{ij}^{(k)}$. Finally, to obtain the best estimate for the couplings we take the weighted average of all the different estimates

$$J^*_{ij} = \frac{1}{M} \sum_k M_k J_{ij}^{(k)} . \tag{12}$$

To estimate the magnetic fields, we first compute them within each cluster: $h_i^{(k)}$ is obtained from eq. (8) with the estimates $J_{ij}^{(k)}$. The final estimate for the magnetic fields is again given by the weighted average over the clusters

$$h^*_i = \frac{1}{M} \sum_k M_k h_i^{(k)} . \tag{13}$$
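
The cluster-wise inference and the weighted averages over clusters can be sketched as follows. The sketch assumes that cluster labels are already available (for instance from K-means or density clustering, neither of which is implemented here); the names `hamming_distance`, `nmf_inference` and `clustered_nmf` are hypothetical, and the per-cluster nMF step sets the diagonal of the couplings to zero as a simplifying choice.

```python
import numpy as np

def hamming_distance(sa, sb):
    """d_ab = (1/4N) sum_i (s_i^a - s_i^b)^2, i.e. the fraction of differing spins."""
    sa, sb = np.asarray(sa, dtype=float), np.asarray(sb, dtype=float)
    return np.sum((sa - sb) ** 2) / (4.0 * sa.size)

def nmf_inference(samples, beta=1.0):
    """Naive mean-field estimates (J, h) from an (M, N) array of +/-1 spins."""
    s = np.asarray(samples, dtype=float)
    m = s.mean(axis=0)
    c = (s.T @ s) / s.shape[0] - np.outer(m, m)
    J = -np.linalg.inv(c) / beta
    np.fill_diagonal(J, 0.0)          # assumption: discard self-couplings
    h = np.arctanh(m) / beta - J @ m
    return J, h

def clustered_nmf(samples, labels, beta=1.0):
    """Apply nMF inside each cluster and average the estimates with weights M_k / M."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    n_samples, n_spins = samples.shape
    J_star = np.zeros((n_spins, n_spins))
    h_star = np.zeros(n_spins)
    for k in np.unique(labels):
        cluster = samples[labels == k]
        J_k, h_k = nmf_inference(cluster, beta)
        weight = cluster.shape[0] / n_samples
        J_star += weight * J_k
        h_star += weight * h_k
    return J_star, h_star
```

Each cluster must contain enough configurations for its correlation matrix to be invertible, which in practice limits how many clusters can be used for a given $M$.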


III. RESULTS ON THE CURIE-WEISS MODEL 


The Curie-Weiss (CW) model is a fully connected ferromagnet with $J_{ij} = 1/N$, $\forall i \neq j$. The model has a paramagnetic phase ($m_i = 0$) at high temperature, $\beta < \beta_c = 1$, and a ferromagnetic phase ($m_i \neq 0$) above $\beta_c$. In the ferromagnetic phase, two states of positive and negative magnetization coexist. In the limit of very large system sizes ($N \to \infty$) magnetizations and correlations can be computed analytically by eqs. (5) and (6), which are exact up to $O(1/N)$ corrections. It means that, by using eqs. (7) and (8), one should obtain the best possible estimate of the parameters $J_{ij}$ and $h_i$, but in the ferromagnetic phase only the solution of eq. (5) with non-zero magnetization should be considered (as discussed in the Introduction). We now evaluate how the following three inference algorithms perform in estimating the couplings of the CW model: (i) the nMF method used naively, without clustering the configurations; (ii) the nMF method on configurations clustered using two clusters; (iii) the PLM on the original configurations.
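
Input data for such a test can be generated without Monte Carlo. Because the CW energy depends only on the total magnetization, one can first draw the number of up spins from its exact distribution and then place them at random. The sketch below follows this route; it is one possible way to produce independent equilibrium configurations, not necessarily the procedure used for the figures, and the function name `sample_curie_weiss` is hypothetical.

```python
import math
import random

def sample_curie_weiss(n_spins, beta, n_samples, seed=0):
    """Draw independent configurations of the zero-field CW model with J_ij = 1/N."""
    rng = random.Random(seed)
    # With S = 2k - N the sum of the spins, -beta * H = beta * S^2 / (2N) + const,
    # and there are C(N, k) configurations having k up spins.
    log_w = [math.log(math.comb(n_spins, k)) + beta * (2 * k - n_spins) ** 2 / (2 * n_spins)
             for k in range(n_spins + 1)]
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(lw - shift) for lw in log_w]
    samples = []
    for k in rng.choices(range(n_spins + 1), weights=weights, k=n_samples):
        s = [1] * k + [-1] * (n_spins - k)
        rng.shuffle(s)  # all arrangements with k up spins are equally likely
        samples.append(s)
    return samples
```

Below the transition temperature the sampled magnetizations concentrate around the two values $\pm m$ solving $m = \tanh(\beta m)$, so that the average over all samples is close to zero while the average inside each state is not.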

In Fig. 1 we report the error achieved by the different methods in inferring the couplings in the temperature range $\beta \in [0.1, 2]$ with $M = 10^4$ and $M = 10^5$, using the following definition

(14)

FIG. 1. Inference of couplings in the CW model with $N = 100$ and two different values of the number $M$ of input configurations. We see that the nMF method with all the input data is good only for $\beta < \beta_c = 1$. For $\beta > \beta_c$ the phase space separates into 2 states and the nMF method with 2 clusters gives much better performance (although it fails badly at high temperature). Inference methods, like PLM and nMF with density clustering, that correctly take into account the clustering of the input configurations provide the best estimate in the entire temperature range, both above and below the transition temperature. In the left inset, we show how the inferred magnetic field at $\beta = 1.6$ decreases when increasing the number of samples ($M \in [10^3, 10^6]$) used for the inference process via nMF with K-means clustering and $K = 2$. In the right inset the same inferred magnetic field is plotted versus $\beta$, for $M = 10^4, 10^5$.

For $\beta < \beta_c$ the paramagnetic fixed point is correct and therefore the reconstruction achieved by nMF is the best possible. However, for $\beta > \beta_c$ the nMF error (red curves) suddenly rises, because the $m_i = 0$ fixed point is no longer the physical one. On the contrary, using the nMF method on the data clustered with exactly 2 clusters (green curves) provides a small error in the ferromagnetic phase, but fails badly in the paramagnetic phase. The inference methods that provide the best estimate in the whole temperature range are the PLM (blue curves) and the nMF with data clustered via density clustering (purple curve), which automatically splits the input data into one or two clusters, depending on the symmetries in the input data. It is worth stressing that these two methods have essentially the same error at any temperature: that is, even the nMF approximation provides the best possible estimates if applied to properly clustered data.

In Fig. 1 we show results obtained with $M = 10^4$ and $M = 10^5$ in order to make evident whether the uncertainties in the coupling estimates are due to the noise in the input data or to an intrinsic limitation of the inference algorithm. For example, deep in the ferromagnetic phase the nMF method has an error decreasing only slightly when $M$ increases, because the error is mainly due to a limitation of the method. On the contrary, PLM and nMF with properly clustered data provide a result whose uncertainty is mainly due to noise in the input data: indeed the error decreases as $1/\sqrt{M}$.

To confirm the correctness of the inference algorithm based on data clustering and nMF equations, we also looked at the inferred value of the magnetic field obtained by using eqs. (8) and (13). We see clearly in the insets of Fig. 1 that, in the low temperature phase, the clustering+nMF method does not predict any anomalously large magnetic field, thanks to the fact that, by clustering the input data, we are actually using the magnetized solutions of eq. (8). In our numerical experiments, we have found overly large inferred magnetic fields only if either the system size was too small or the input data were too noisy: in the former case the problem resides in the fact that eq. (8) is only a crude approximation, while in the latter case it is a consequence of large errors in the reconstruction of the couplings.


IV. RESULTS ON THE HOPFIELD MODEL 

We now extend our analysis to a more complicated case by considering the Hopfield model. The Hopfield model was introduced long ago [22] to model neural networks: it is a fully-connected Ising model, whose couplings can be chosen such that the free energy of the model has $2P$ different minima (that act as attractors for the pattern recovery dynamics). In some sense, the Hopfield model can be seen as a generalization of the Curie-Weiss model, which is indeed equivalent to the $P = 1$ case. We are interested in studying the inverse Ising problem in the Hopfield model because configurations sampled at low temperature in the Hopfield model are typically clustered around the $2P$ free-energy minima: consequently naive MF methods face even more severe limitations than in the low temperature phase of the CW model, and we want to study how much MF methods for the inverse Ising problem can be improved by clustering the input configurations.

FIG. 2. Main panels: errors in inferring the couplings in Hopfield models with $P = 2$ uncorrelated (top), correlated (center) and anti-correlated (bottom) patterns. The comparison is between MF methods with clustered data (either K-means or density clustering) and PLM. In the inset of the top panel, we show that the likelihood of the clustering algorithm suggests taking one cluster below $\beta_c \approx 1.1$ and 4 clusters above $\beta_c$. In the inset of the bottom panel we show the magnetic field inferred by nMF+clustering, which is very small in both phases.

FIG. 3. Errors in inferring the couplings in the Hopfield model with 3 patterns (and thus 6 minima). We observe again that our algorithm, based on MF methods applied to clustered data, achieves its best performance when the input data are split into 6 clusters. For comparison, we also show the results obtained when the clustering is done many times with different initial conditions (label ‘many IC’), picking the clustering having the largest likelihood. In this case, the error matches the error obtained when putting each configuration in the correct cluster. We can see that our method performs at its best at almost any $\beta$ value, except at a few points where it is particularly difficult to find the best clustering. In the inset we see that likelihood maximization suggests using 1 cluster for $\beta < \beta_c$ and 6 clusters for $\beta > \beta_c$.

The Hamiltonian of the Hopfield model reads

$$H(s) = -\frac{1}{N} \sum_{i<j} \sum_{\alpha=1}^{P} \xi_i^\alpha \xi_j^\alpha s_i s_j , \tag{15}$$

where the $P$ patterns $\xi^\alpha$ identify the directions of the free-energy minima. In the standard Hopfield model, the $\xi$s are drawn from the bimodal distribution, that is $\xi_i^\alpha = \pm 1$ with probability 1/2 independently. In our study we also consider the case where the patterns $\xi$ are correlated, by setting 10% of their components equal ($\xi_i^\alpha = \xi_i^\beta$, $\forall \alpha, \beta$), and anti-correlated (only when $P = 2$), by setting 10% of their components in an opposite way ($\xi_i^1 = -\xi_i^2$). This model presents a paramagnetic phase at high temperature, and an ordered phase at low temperature defined by the states around the patterns $\{\xi\}$ if the number of patterns is not too high [23]. The ordered phase is characterized by a Gibbs-Boltzmann measure clustered around one of the $2P$ available states (for a given $P$ there will be $2P$ stable states in the low temperature region due to the spin-flip symmetry).
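
The construction of the Hopfield couplings from the patterns can be sketched as follows, assuming the standard normalization $J_{ij} = \frac{1}{N} \sum_\alpha \xi_i^\alpha \xi_j^\alpha$. The function names are hypothetical, and `correlate_patterns` implements only one possible reading of "setting 10% of the components equal", namely copying the first components of the first pattern into the others.

```python
import random

def random_patterns(n_spins, n_patterns, seed=0):
    """Draw patterns with independent components equal to +1 or -1 with probability 1/2."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n_spins)] for _ in range(n_patterns)]

def correlate_patterns(patterns, fraction=0.1):
    """Assumed scheme: copy the first `fraction` of pattern 0 into all the other patterns."""
    n_shared = round(fraction * len(patterns[0]))
    out = [list(xi) for xi in patterns]
    for xi in out[1:]:
        xi[:n_shared] = out[0][:n_shared]
    return out

def hopfield_couplings(patterns):
    """J_ij = (1/N) sum_alpha xi_i^alpha xi_j^alpha for i != j, and J_ii = 0."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                J[i][j] = sum(xi[i] * xi[j] for xi in patterns) / n
    return J
```

With a single pattern, the change of variables $s_i \to \xi_i s_i$ turns every coupling into $1/N$, which is the sense in which the $P = 1$ case is equivalent to the Curie-Weiss model.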

We now show our results on inferring the Hopfield couplings by using MF methods on clustered data. In Fig. 2 we consider systems with $N = 100$ spins and $P = 2$ (therefore 4 states) in all the three possible cases (standard, correlated and anti-correlated patterns). We observe that MF methods with the right number of clusters perform similarly to the PLM, which is at present the best available algorithm to solve the inverse Ising problem. The right number of clusters can be obtained either by density clustering or by maximizing the likelihood of the clustering obtained by K-means (see the inset of panel (a) in Fig. 2).

As in the CW model, also for the Hopfield model the magnetic fields inferred by MF methods on clustered data are very small, and independent of the possible long-range order present in the model (see the inset of panel (c) of Fig. 2).

In Fig. 3 we show the results on inferring the couplings of Hopfield models with $P = 3$ patterns (and thus 6 free-energy minima). Again MF methods applied to input data clustered with the right number of clusters perform very similarly to PLM, and much better than standard MF methods applied directly to all the input data. It is worth noticing that the best result by the clustering+nMF algorithm has been obtained by running the clustering procedure several times with different initial conditions (data labeled ‘many IC’ in Fig. 3) and then picking the clustering having the largest likelihood. This is expected, since a clustering algorithm such as K-means is not very stable for large $K$ and its outcome strongly depends on the initial condition.

Let us finally discuss the time complexity of the three algorithms we have used: PLM, K-means+nMF and density clustering+nMF. Regarding the system size dependence, all three algorithms have a time complexity $O(N^3)$, either because of the inversion of an $N \times N$ matrix in nMF methods, or because of the computation of the gradient of the pseudo-likelihood (PL), which is $O(N)$, in a space of $O(N^2)$ variables. Their dependence on the number $M$ of input configurations is different: PLM is linear in $M$, but the search for the maximum of the PL requires computing the PL and its derivatives many times; K-means is linear in $M$, but often a search for the optimal clustering requires running it with many different initial conditions; density clustering is $O(M^2)$, so, although it provides a robust result, it is impracticable when the number of samples is very high (however we are aware that the authors of Ref. [21] are developing a faster version of the density clustering algorithm). Therefore, nMF methods are always faster, with a total complexity of $O(KMN + N^3)$, whereas PLM is $O(MN^3)$. In practice, we observe that it is better to use PLM when the number $M$ of input configurations is small, since it gives in general better estimates of the reconstructed couplings. When $M$ becomes large, nMF with K-means clustering is clearly recommended, since PLM would be slowed down by the large number of samples.
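
As a rough illustration of these scalings, take the largest CW setting used above, $N = 100$, $M = 10^5$ and $K = 2$. Ignoring constant prefactors and the number of iterations each algorithm needs (both are assumptions of this estimate), the two operation counts are

$$K M N + N^3 = 2 \times 10^5 \times 10^2 + 10^6 = 2.1 \times 10^7 , \qquad M N^3 = 10^5 \times 10^6 = 10^{11} ,$$

that is, a ratio of about $4.8 \times 10^3$ in favor of K-means+nMF. The gap widens linearly with $M$ once the term $KMN$ dominates over $N^3$.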


V. CONCLUSIONS 

In this work we have presented a very simple way to make mean-field approximations to the inverse Ising problem effective also in the low temperature phase, where symmetries usually get broken and, correspondingly, the input data get clustered. The idea is to cluster the input data and to apply mean-field methods to each data cluster. We have tested this clustering+nMF algorithm on the Curie-Weiss and Hopfield models, comparing the results with the more sophisticated, state-of-the-art pseudo-likelihood method.

The results are very promising and redeem mean-field approximations to inverse problems, even in those cases where the structure of the input data is such that a straightforward application of mean-field methods would be ineffective.

The natural follow-up to this work is the application of clustering+nMF methods to inverse problems based on real data. It is worth remembering that often, in solving inverse problems based on real and noisy data, the robustness of simple MF methods is more valuable than the putative higher accuracy of more sophisticated methods: see e.g. the case of inferring protein contacts [5]. It is also worth mentioning cases where the data can be naturally divided into two or more classes exhibiting different statistical properties, but this is usually not taken into account when estimating the model parameters. For example, in an impressively detailed analysis of neuronal spiking patterns reported in the literature, the data belonging to two different regimes (quiescent and spiking) are merged together before doing the analysis according to a mean-field approximation. The application of the method presented in this work is likely to improve inference and reduce errors. Finally, the numerous recent studies on pattern recognition using neural networks might also benefit from an approach dealing with clusters. In those systems it is quite common to deal with many basins of attraction that are used to improve the efficiency of the neural network. Mean-field techniques would be more than welcome, since methods such as PLM cannot deal with large dataset sizes (particularly since the average over all samples has to be done at each step of the algorithm).

From this point of view, enlarging the range of applicability of MF methods by data clustering is certainly very useful and maybe better than developing higher-order approximations (which strongly depend on the model used to describe the data).